[Data Visualization] What - Data Abstraction
The Big Picture
- What data does the user see? (data abstraction)
- Why does the user use the system? (task abstraction)
- How are the visual encoding and interaction idioms constructed? (idiom abstraction)
What - Data Abstraction
Why Data Abstraction?
- There are an infinite number of datasets.
- Thus, it would be inefficient to design a visualization system for every dataset.
- We will categorize and characterize data types that a visualization system aims to visualize.
- Insights gained from such analysis can be used to design a visualization system in the future.
Why Data Semantics and Types?
1, 4.5, -3, 10001, 2, 0
- What does this sequence of six numbers mean?
- Multiple interpretations are possible.
Basil, 7, S, Pear
- The same goes for this record.
- The type of the data is its structural or mathematical interpretation.
- The semantics of the data is its real-world meaning.
- Metadata: additional data required for correctly interpreting data
- Types
- Semantics
- Syntax of a data file (e.g., CSV, TSV, or JSON)
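To see why metadata matters, the following minimal Python sketch reads the record above in two different ways. Both schemas (the field names and the attribute types assigned to them) are hypothetical and chosen only for illustration; nothing here claims to be the correct meaning of the record.

```python
# The raw record from above, with no metadata attached.
record = ["Basil", 7, "S", "Pear"]

# Two hypothetical schemas: (semantics, type) for each position in the record.
# Neither is "the" correct reading; both are assumptions for illustration.
person_schema = [
    ("name", "categorical"),
    ("age", "quantitative"),
    ("shirt_size", "ordinal"),
    ("favorite_fruit", "categorical"),
]
grocery_schema = [
    ("herb", "categorical"),
    ("quantity", "quantitative"),
    ("package_size", "ordinal"),
    ("fruit", "categorical"),
]


def interpret(values, schema):
    """Pair each raw value with its semantics (name) and its type."""
    return {name: (value, attr_type) for value, (name, attr_type) in zip(values, schema)}


as_person = interpret(record, person_schema)
as_grocery = interpret(record, grocery_schema)

print(as_person["age"])        # the 7 read as an age
print(as_grocery["quantity"])  # the same 7 read as a quantity
```

The values are identical in both readings; only the metadata differs, and that is what decides which operations and visual encodings make sense.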
Data, Dataset, and Attributes
- We will learn three concepts: data types, dataset types, and attribute types.
- Data types: what kind of thing is the data?
- e.g., an item, a link, or an attribute
- Dataset types: how are these data types combined into a larger structure?
- e.g., a table, a tree, or a field of sampled values
- Attribute types: what kinds of mathematical operations are meaningful for an attribute?
- e.g., quantity, category, $\dots$
Data Types
- Five Basic Data Types: Items, Attributes, Links, Positions, and Grids
- An item is an individual entity that is discrete.
- e.g., a row in a simple table or a node in a network.
- An attribute is some specific property that can be measured, observed, or logged.
- Sometimes called a variable or a dimension
- e.g., salary, price, or number of sales
- A link is a relationship between items typically within a network.
- e.g., marriage relationship
- A grid specifies the strategy for sampling continuous data in terms of both geometric and topological relationships between its cells.
- A position is spatial data, providing a location in two-dimensional (2D) or three-dimensional (3D) space.
- e.g., a latitude–longitude pair describing a location on the Earth’s surface
- e.g., three numbers specifying a location within the region of space measured by a medical scanner
Dataset Types
- Let’s combine these five basic data types.
- One of the most common dataset types is the table.
- A table dataset type includes item (rows) and attribute (columns) data types.
- A network dataset type consists of three data types: items (nodes), links (edges), and attributes (of nodes and links).
- Four dataset types: tables, networks and trees, fields, and geometry
- Other dataset types: clusters, sets, and lists
Tables
- A table is made up of rows and columns.
- Usually, 2D
- A row represents an item.
- A column represents an attribute.
- A cell is specified by the combination of a row and a column.
- It stores the value specified by an item and an attribute.
- A multidimensional table has a more complex structure for indexing into a cell, with multiple keys.
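The difference between a flat table and a multidimensional table can be sketched in a few lines of Python. The column names, the rows, and the `sales` lookup below are all made up for illustration.

```python
# A flat table: each row is an item, each column is an attribute.
# All values are made up for illustration.
columns = ["id", "name", "age"]
rows = [
    [1, "Amy", 8],
    [2, "Basil", 7],
    [3, "Amy", 9],
]


def cell(rows, columns, row_index, attribute):
    """Look up one cell by the combination of a row (item) and a column (attribute)."""
    return rows[row_index][columns.index(attribute)]


# A multidimensional table: a cell is indexed by several keys at once.
# The hypothetical keys are (store, month); the value is a sales count.
sales = {
    ("Seoul", "Jan"): 120,
    ("Seoul", "Feb"): 135,
    ("Busan", "Jan"): 80,
}

print(cell(rows, columns, 1, "name"))
print(sales[("Seoul", "Feb")])
```

In the flat table a single row index is enough to find an item, whereas the multidimensional table needs the full key tuple.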
Networks and Trees
- A network is made up of nodes and links and specifies the relationship between two or more nodes.
- Nodes and links can have attributes independently.
- Example: social network on Facebook
- Node: accounts (people, organizations, or pages)
- Link: friendship (or subscription)
- Node attributes: name, photo, website_url, …
- Link attributes: last interaction time, …
- Networks with hierarchical structure are more specifically called trees.
- In contrast to a general network, trees do not have cycles.
- Each child node has only one parent node pointing to it.
- Networks are sometimes called graphs.
- e.g., graph drawing and graph theory
- However, the term graph is also used for charts.
- e.g., bar graph and line graph
- To avoid confusion, we will use the term chart for these.
- Two popular visualizations for networks: a node-link diagram and an adjacency matrix.
- There are a lot of network visualizations!
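As a rough sketch of the ideas in this section, the Python below stores a small network as a list of links, derives the adjacency matrix from it, and checks whether the network is a tree. The node names and links are hypothetical, and the network is assumed to be undirected.

```python
# A small hypothetical network: nodes are accounts, links are friendships.
nodes = ["ann", "bob", "cho", "dee"]
links = [("ann", "bob"), ("ann", "cho"), ("cho", "dee")]


def adjacency_matrix(nodes, links):
    """Build a symmetric 0/1 matrix: entry [i][j] is 1 if nodes i and j are linked."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in links:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1  # friendship is undirected
    return matrix


def is_tree(nodes, links):
    """An undirected network is a tree if it is connected and has no cycles.

    Equivalently, it has exactly len(nodes) - 1 links and is connected.
    """
    if len(links) != len(nodes) - 1:
        return False
    neighbors = {name: [] for name in nodes}
    for a, b in links:
        neighbors[a].append(b)
        neighbors[b].append(a)
    # Depth-first traversal from the first node to test connectivity.
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for nxt in neighbors[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(nodes)


for row in adjacency_matrix(nodes, links):
    print(row)
print(is_tree(nodes, links))
```

The list of links is the natural input for a node-link diagram, while the matrix form corresponds directly to an adjacency matrix view.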
Fields
- The field dataset type contains attribute values associated with cells.
- What is the difference between 2D tables and 2D fields?
- In a 2D field, each cell contains measurements or calculations from a continuous domain.
- In principle, you could take measurements at infinitely many points!
- In a table, rows and columns are discrete.
- Consider a field dataset representing a medical scan of a human body.
- This is a 3D field, because the body occupies a continuous region of 3D space.
- We can determine the resolution of the scan (i.e., granularity)
- A low resolution (a coarser grid): 64 * 64 * 64 cells
- A high resolution (a finer grid): 256 * 256 * 256 cells
- Since it is impossible to measure an infinite number of cells, sampling and interpolation techniques are important in the field dataset type.
- Sampling: how frequently to take the measurements?
- Interpolation: how to show values in between the sampled points in a way that does not mislead.
- Interpolating appropriately between the measurements allows you to reconstruct a new view of the data.
- Grid geometry: the location of cells in space
- Grid topology: how each cell connects with its neighboring cells
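Sampling and interpolation can be illustrated with a one-dimensional toy field. In this sketch the continuous phenomenon is assumed to be `math.sin` on the interval from 0 to pi, the grid is uniform, and the reconstruction uses linear interpolation; all of these are choices made for the example, not the only options.

```python
import math


def sample(f, start, stop, n):
    """Sample a continuous function f at n evenly spaced grid points on [start, stop]."""
    step = (stop - start) / (n - 1)
    xs = [start + i * step for i in range(n)]
    return xs, [f(x) for x in xs]


def interpolate(xs, ys, x):
    """Linearly interpolate a value at x from its two neighboring samples."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the sampled range")
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return (1 - t) * ys[i] + t * ys[i + 1]


# The same field sampled on a coarse grid and on a fine grid.
coarse = sample(math.sin, 0.0, math.pi, 5)
fine = sample(math.sin, 0.0, math.pi, 65)

x = 1.0  # a location that is not a grid point in either sampling
coarse_error = abs(interpolate(*coarse, x) - math.sin(x))
fine_error = abs(interpolate(*fine, x) - math.sin(x))
print(coarse_error, fine_error)
```

The finer grid reconstructs the value at `x` more accurately, which is the trade-off behind choosing a scan resolution: more cells cost more to measure and store but mislead less when interpolated.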
SciVis vs InfoVis
- If we want to visualize a 2D field, an obvious choice for visual encoding would be to keep the spatialization of the data in the visualization.
- e.g., longitude -> horizontal position, latitude -> vertical position
- Scientific visualization (SciVis) is concerned with situations where spatial position is given with the dataset.
- Information visualization (InfoVis) is concerned with situations where the use of space in a visual encoding is chosen by the designer.
Geometry
- The geometry dataset type specifies information about the shape of items with explicit spatial positions.
- Items + positions
- e.g., points, one-dimensional lines or curves, 2D surfaces or regions, or 3D volumes
- e.g., cartography
Other Dataset Types
- Set: an unordered group of items
- List: an ordered group of items
- Cluster: grouping based on attribute similarity, where items within a cluster are more similar to each other than to ones in another cluster
InfoVis Subfields
- (Dataset type) + “visualization”
- Table visualization
- Network visualization
- Field visualization (usually, vector or tensor visualization in SciVis)
- Set visualization
- Cluster visualization
- $\dots$
Dataset Availability
- Any dataset type can be static or dynamic.
- The default approach to visualization assumes that the entire dataset is available all at once, as a static file (static datasets, offline).
- Recently, it has become more common to visualize datasets that change over the course of the visualization session (dynamic datasets, online).
- e.g., monitoring, …
Attribute Type
- The type of an attribute
- Categorical data (sometimes called nominal or qualitative) do not have an implicit ordering.
- But they often have a hierarchical structure.
- e.g., movie genres, file types, and city names
- Operators: == and !=
- Ordered data have an implicit ordering.
- Ordinal data have an ordering, but arithmetic is not meaningful (e.g., shirt sizes, ranks)
- Quantitative data have an ordering and arithmetic makes sense (e.g., length, stock prices)
- Operators: ==, !=, >, and < (plus + and - for quantitative data only)
- Quantitative data can be further divided into two types: interval and ratio.
- In interval data, distances are meaningful but there is no absolute zero.
- e.g., temperature in Celsius or Fahrenheit
- Multiplication and division do not make sense: 60°C is not twice as hot as 30°C.
- In ratio data, distances are meaningful and there is an absolute zero.
- e.g., temperature in Kelvin
- 60 K is twice as hot as 30 K.
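The interval versus ratio distinction can be checked numerically. The sketch below uses the standard Celsius-to-Kelvin offset of 273.15; the variable names are only for illustration.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (an interval scale) to Kelvin (a ratio scale)."""
    return celsius + 273.15


a_c, b_c = 30.0, 60.0
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Differences are meaningful on both scales, and they agree.
diff_c = b_c - a_c
diff_k = b_k - a_k

# The Celsius ratio comes out as 2.0, but only because of where zero was placed.
naive_ratio = b_c / a_c
# The Kelvin ratio is measured from absolute zero, so it is physically meaningful.
true_ratio = b_k / a_k

print(diff_c, round(diff_k, 2), naive_ratio, round(true_ratio, 3))
```

Subtraction gives the same answer on both scales, while division only gives a meaningful answer once the values are expressed on a scale with an absolute zero.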
Summary: Attribute Types
- Categorical (sometimes nominal or qualitative): movie genres, file types, …
- Operators: ==, !=
- Ordinal: shirt sizes, ranks, …
- Operators: ==, !=, <, >
- Interval: temperature in Celsius, …
- Operators: ==, !=, <, >, +, -
- Ratio: temperature in Kelvin, number of people, …
- Operators: ==, !=, <, >, +, -, *, /
- For ordered data, we can consider the ordering direction.
- i.e., where is the origin?
- Sequential: there is a homogeneous range from a minimum to a maximum value, such as height.
- Diverging: data can be deconstructed into two sequences pointing in opposite directions that meet at a common zero point, such as elevation.
- Cyclic: the values wrap around back to a starting point rather than continuing to increase indefinitely, such as the day of the week.
- Color schemes for ordering directions
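Ordering direction also changes how values should be compared or normalized. As a minimal sketch, the helpers below (both hypothetical, written for this example) compute a distance on a cyclic attribute and map a diverging attribute onto the range from -1 to 1 around its zero point.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def cyclic_distance(a, b, period):
    """Shortest distance between two positions on a cycle of the given length."""
    d = abs(a - b) % period
    return min(d, period - d)


def diverging_position(value, low, high):
    """Map a value to [-1, 1] around a zero midpoint, scaling each side separately.

    Assumes low < 0 < high, e.g. elevation below and above sea level.
    """
    return value / high if value >= 0 else -(value / low)


# Sunday and Monday are neighbors on the weekly cycle, not six days apart.
print(cyclic_distance(DAYS.index("Sun"), DAYS.index("Mon"), len(DAYS)))

# Made-up elevation range: 100 m below sea level to 8000 m above it.
print(diverging_position(-50, -100, 8000))
print(diverging_position(4000, -100, 8000))
```

A diverging color scheme could then use the sign of the normalized value to pick the hue and its magnitude to pick the intensity, while a cyclic scheme needs colors that wrap around in the same way the distance does.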
Semantics
- Two types of attribute semantics: key and value
- A key attribute acts as an index that is used to look up value attributes.
- Key: your student ID
- Value: your name
- Keys are sometimes called independent attributes or dimensions, and values are sometimes called dependent attributes or measures.
- Types and semantics are cross-cutting.
- A categorical attribute can be a value attribute, and a quantitative attribute can be a key attribute.
- Name is a categorical attribute that might appear to be a reasonable key at first.
- But it is not a good choice, since two people (both named Amy) share the same name.
- The quantitative attribute Age and the ordinal attribute Shirt Size have many duplicates, so they are not good choices either.
- ID can serve as a key attribute.
- For multidimensional tables or fields, multiple keys are required to look up an item.
- The combination of all keys must be unique for each item, even though an individual key attribute may contain duplicates!
- (ID, Name)
- Multidimensional: data have multiple keys.
- one-dimensional, two-dimensional, …
- Multivariate: data have multiple values.
- univariate, bivariate, …
- Many people do not separate these two terms but they are DIFFERENT!
- Multidimensional vs. multivariate
- Suppose you measured the temperature of a 3D space.
- You have three keys: x, y, and z
- For each cell, you have one value (temperature)
- So, you have three dimensions (keys) and one value attribute for each cell!
- Three-dimensional univariate dataset!
- Keys: one-dimensional, two-dimensional, three-dimensional, …, multidimensional
- Values: univariate (scalar), bivariate (vector), trivariate (tensor), …, multivariate
- Measuring the wind direction at several locations in a region gives a two-dimensional (latitude and longitude) vector field (the direction of the wind).
- Can you imagine a 4-dimensional univariate dataset?
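The key and value ideas in this section can be sketched in Python. The table below is hypothetical: it imitates the Name, Age, Shirt Size, and ID example with made-up rows, and the temperature values in the 3D grid are invented as well.

```python
# A hypothetical table in the spirit of the example above; all values are made up.
table = [
    {"id": 1, "name": "Amy", "age": 8, "shirt_size": "S"},
    {"id": 2, "name": "Basil", "age": 7, "shirt_size": "S"},
    {"id": 3, "name": "Amy", "age": 9, "shirt_size": "M"},
    {"id": 4, "name": "Clara", "age": 7, "shirt_size": "L"},
]


def is_key(rows, attributes):
    """A set of attributes can serve as a key if its combined values are unique per item."""
    seen = {tuple(row[a] for a in attributes) for row in rows}
    return len(seen) == len(rows)


print(is_key(table, ["id"]))           # unique for every item
print(is_key(table, ["name"]))         # two items share the name Amy
print(is_key(table, ["name", "age"]))  # unique only in combination

# A three-dimensional univariate field: three keys (x, y, z) index one value.
temperature = {
    (x, y, z): 20.0 + x + y + z  # made-up temperature values
    for x in range(2)
    for y in range(2)
    for z in range(2)
}
n_keys = len(next(iter(temperature)))  # number of dimensions
n_values = 1                           # one value per cell, so univariate
print(n_keys, n_values)
```

Counting the entries in the lookup tuple gives the dimensionality, and counting the values stored per cell gives the number of variates, which keeps the two terms apart.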